Segmentation of speech using speaker identification

نویسندگان

  • Lynn Wilcox
  • Francine Chen
  • Don Kimber
  • Vijay Balasubramanian
چکیده

This paper describes techniques for segmentation of conversational speech based on speaker identity. Speaker seg-mentation is performed using Viterbi decoding on a hidden Markov model network consisting of interconnected speaker sub-networks. Speaker sub-networks are initialized using Baum-Welch training on data labeled by speaker, and are iteratively retrained based on the previous segmentation. If data labeled by speaker is not available, agglomerative clustering is used to approximately segment the conversational speech according to speaker prior to Baum-Welch training. The distance measure for the clustering is a likelihood ratio in which speakers are modeled by Gaussian distributions. The distance between merged segments is recomputed at each stage of the clustering, and a duration model is used to bias the likelihood ratio. Segmentation accuracy using agglomorative clustering initialization matches accuracy using initialization with speaker labeled data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Approach of Speaker Diarization by Gaussian Mixture Model (GMM)

Speaker identification is an important activity in the process of speaker diarization. We need to model the speaker by Gaussian mixture model (GMM) for speaker identification purpose. Large GMM is called as a Universal Background Model (UBM) which is adapted into each speaker model for speaker identification purpose. This paper focuses on speech clustering for speaker diarization. The speaker d...

متن کامل

Visual speech segmentation and speaker recognition for transcription of TV news

This paper is about a method for visual segmentation of TV news. The TV news shows are segmented according to the visual stream from the video TV recordings in this method. Human faces are found in the single visual segments with the help of the fast algorithm for face detection. The found faces are compared with the visual GMMs, that have been trained from the video picture of the single broad...

متن کامل

Multimodal Speaker Segmentation and Identification in Presence of Overlapped Speech Segments

We propose a multimodal algorithm for speaker segmentation and identification with two main contributions: First, we suggest a hidden Markov model architecture that performs fusion of three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system; Second, we present a novel likelihood model for the microphon...

متن کامل

Segmentation of speech for speaker and language recognition

Current Automatic Speech Recognition systems convert the speech signal into a sequence of discrete units, such as phonemes, and then apply statistical methods on the units to produce the linguistic message. Similar methodology has also been applied to recognize speaker and language, except that the output of the system can be the speaker or language information. Therefore, we propose the use of...

متن کامل

STON: Efficient Subtitling in Dutch Using State-of-the-Art Tools

We present a modular video subtitling platform that integrates speech/non-speech segmentation, speaker diarisation, language identification, Dutch speech recognition with state-of-the-art acoustic models and language models optimised for efficient subtitling, appropriate preand postprocessing of the data and alignment of the final result with the video fragment. Moreover, the system is able to ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994